Embedding IMDI Metadata into a Large Phonetic Corpus

Authors

  • Oliver Schonefeld
  • Jan-Torsten Milde
Abstract

The paper describes the setup of a large phonetic corpus (the LeaP corpus). It explains how the corpus metadata is structured and transformed into an extended IMDI/ISLE metadata structure. It also covers how this structure was transcoded into the TASX metadata format and finally integrated into the LeaP corpus.
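The abstract does not spell out the mapping from IMDI to TASX. The following is only a rough sketch of what such a transcoding step could look like, and it rests on several assumptions. First, the input is a heavily simplified, namespace-free IMDI-style record rather than the real IMDI schema. Second, user-defined IMDI key/value pairs appear as `<Key Name="...">` elements. Third, the TASX side is modelled as a flat `<meta>` element holding `<desc name="...">` entries. The names `IMDI_SAMPLE`, `flatten` and `to_tasx_meta`, and the TASX element layout, are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified IMDI-style session record.
# This is NOT the real IMDI schema (no namespaces, few elements).
IMDI_SAMPLE = """
<Session>
  <Name>sample_session_001</Name>
  <MDGroup>
    <Actors>
      <Actor>
        <Role>Speaker</Role>
        <Age>24</Age>
        <Keys>
          <Key Name="L1">Polish</Key>
        </Keys>
      </Actor>
    </Actors>
  </MDGroup>
</Session>
"""


def flatten(elem, prefix=""):
    """Yield (path, value) pairs for every leaf element that has text.

    User-defined <Key Name="..."> elements are keyed by their Name
    attribute instead of the generic tag name "Key".
    """
    name = elem.get("Name") if elem.tag == "Key" else elem.tag
    path = f"{prefix}.{name}" if prefix else name
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if text:
            yield path, text
        return
    for child in children:
        yield from flatten(child, path)


def to_tasx_meta(imdi_xml):
    """Build a TASX-like <meta> element (assumed layout) from IMDI-style XML."""
    root = ET.fromstring(imdi_xml)
    meta = ET.Element("meta")
    for path, value in flatten(root):
        # One <desc> entry per flattened metadata field.
        desc = ET.SubElement(meta, "desc", name=path)
        desc.text = value
    return meta


if __name__ == "__main__":
    print(ET.tostring(to_tasx_meta(IMDI_SAMPLE), encoding="unicode"))
```

Flattening the hierarchy into dotted paths keeps the IMDI structure recoverable from the names alone. This sketch does not disambiguate repeated siblings, such as several `<Actor>` elements. A real transcoder would need an index or identifier in the path for those.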


Similar Resources

Using Profiles for IMDI Metadata Creation

In this paper, a system to support the creation of extended IMDI metadata records is presented. It is based on bundling, in a profile, definitions of the user-definable key-name/value pairs of the IMDI system. The possibility of using inheritance of profiles in a corpus structure is explored. Profiles can be created and used by the IMDI Editor, a tool specially designed to create IMDI metadata rec...


memasysco: XML schema based metadata management system for speech corpora

The metadata management system for speech corpora “memasysco” has been developed at the Institut für Deutsche Sprache (IDS) and is applied for the first time to document the speech corpus “German Today”. memasysco is based on a data model for the documentation of speech corpora and contains two generic XML schemas that drive data capture, XML native database storage, dynamic publishing, and inf...


A Large Metadata Domain of Language Resources

The INTERA and ECHO projects were partly intended to create a critical mass of open and linked metadata descriptions of language resources, helping researchers to understand the benefits of an increased visibility of language resources in the Internet and motivating them to participate. The work was based on the new IMDI version 3.0.3 which is a result of experiences with the earlier versions a...


LAMUS: the Language Archive Management and Upload System

LAMUS is a web-based service that allows researchers to deposit their language resources into a language resources archive. It was developed at the MPI for Psycholinguistics for stricter control of the archive coherence and consistency and allowing wider use of the archiving facilities without increasing the workload for archive and corpus managers. LAMUS is based on the use of IMDI metadata st...


Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus

The AMI Meeting Corpus is now publicly available, including manual annotation files generated in the NXT XML format, but lacking explicit metadata for the 171 meetings of the corpus. To increase the usability of this important resource, a representation format based on relational databases is proposed, which maximizes informativeness, simplicity and reusability of the metadata and annotations. ...




Publication date: 2004